TO DO:

La Gpu con 112 livelli è inutilizzabile, possiamo dividere in integrata e dedicata, oppure dividere per le marche.

@Dario: Per la gpu farei innanzitutto una divisione tra integrata e dedicata, e tra quelle dedicate le suddividerei per VRAM (inizialmente pensavo di cercare dei prezzi per avere una stima quantitativa di quanto valga la gpu dedicata, però bisognerebbe cercare prezzi medi e visto che i modelli son tanti mi pare troppo lavoro e non chissà quanto significativo)

Anche la memoria è problematica, ho già estratto la presenza o meno dell’SSD, ma bisogna pensare ad un modo di valutare la memoria.

@Dario: ottimo la divisione SSD/HD, per avere un indicazione quantitativa potremmo cercare il prezzo di un gb di SSD e di un gb di HD da una fonte affidabile, e pesare l’importanza delle dimensioni su questo indicatore, che ne dite?

data <- read.csv("../data/Laptop2.csv")
str(data)
## 'data.frame':    1303 obs. of  17 variables:
##  $ X               : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Company         : Factor w/ 19 levels "Acer","Apple",..: 2 2 8 2 2 1 2 2 3 1 ...
##  $ Product         : Factor w/ 618 levels "110-15ACL (A6-7310/4GB/500GB/W10)",..: 302 300 51 302 302 59 302 300 615 431 ...
##  $ TypeName        : Factor w/ 6 levels "2 in 1 Convertible",..: 5 5 4 5 5 4 5 5 5 5 ...
##  $ Inches          : num  13.3 13.3 15.6 15.4 13.3 15.6 15.4 13.3 14 14 ...
##  $ ScreenResolution: Factor w/ 40 levels "1366x768","1440x900",..: 24 2 9 26 24 1 26 2 9 16 ...
##  $ Cpu             : Factor w/ 118 levels "AMD A10-Series 9600P 2.4GHz",..: 55 53 64 75 57 15 74 53 96 73 ...
##  $ Ram             : int  8 8 8 16 8 4 16 8 16 8 ...
##  $ Memory          : Factor w/ 39 levels "1.0TB HDD","1.0TB Hybrid",..: 5 3 17 30 17 27 16 16 30 17 ...
##  $ Gpu             : Factor w/ 110 levels "AMD FirePro W4190M",..: 59 52 54 10 60 18 61 52 98 62 ...
##  $ OpSys           : Factor w/ 9 levels "Android","Chrome OS",..: 5 5 6 5 5 7 4 5 7 7 ...
##  $ Weight          : num  1.37 1.34 1.86 1.83 1.37 2.1 2.04 1.34 1.3 1.6 ...
##  $ Price           : num  1340 899 575 2537 1804 ...
##  $ Frequenza       : num  2.3 1.8 2.5 2.7 3.1 3 2.2 1.8 1.8 1.6 ...
##  $ Risoluzione     : Factor w/ 15 levels "1366x768","1440x900",..: 11 2 4 13 11 1 13 2 4 4 ...
##  $ Pixel           : int  4096000 1296000 2073600 5184000 4096000 1049088 5184000 1296000 2073600 2073600 ...
##  $ SolidStateDisk  : Factor w/ 2 levels "False","True": 2 1 2 2 2 1 1 1 2 2 ...
head(data)
##   X Company     Product  TypeName Inches
## 1 1   Apple MacBook Pro Ultrabook   13.3
## 2 2   Apple Macbook Air Ultrabook   13.3
## 3 3      HP      250 G6  Notebook   15.6
## 4 4   Apple MacBook Pro Ultrabook   15.4
## 5 5   Apple MacBook Pro Ultrabook   13.3
## 6 6    Acer    Aspire 3  Notebook   15.6
##                     ScreenResolution                        Cpu Ram
## 1 IPS Panel Retina Display 2560x1600       Intel Core i5 2.3GHz   8
## 2                           1440x900       Intel Core i5 1.8GHz   8
## 3                  Full HD 1920x1080 Intel Core i5 7200U 2.5GHz   8
## 4 IPS Panel Retina Display 2880x1800       Intel Core i7 2.7GHz  16
## 5 IPS Panel Retina Display 2560x1600       Intel Core i5 3.1GHz   8
## 6                           1366x768    AMD A9-Series 9420 3GHz   4
##                Memory                          Gpu      OpSys Weight
## 1           128GB SSD Intel Iris Plus Graphics 640      macOS   1.37
## 2 128GB Flash Storage       Intel HD Graphics 6000      macOS   1.34
## 3           256GB SSD        Intel HD Graphics 620      No OS   1.86
## 4           512GB SSD           AMD Radeon Pro 455      macOS   1.83
## 5           256GB SSD Intel Iris Plus Graphics 650      macOS   1.37
## 6           500GB HDD                AMD Radeon R5 Windows 10   2.10
##     Price Frequenza Risoluzione   Pixel SolidStateDisk
## 1 1339.69       2.3   2560x1600 4096000           True
## 2  898.94       1.8    1440x900 1296000          False
## 3  575.00       2.5   1920x1080 2073600           True
## 4 2537.45       2.7   2880x1800 5184000           True
## 5 1803.60       3.1   2560x1600 4096000           True
## 6  400.00       3.0    1366x768 1049088          False
summary(data)
##        X             Company                  Product    
##  Min.   :   1.0   Dell   :297   XPS 13            :  30  
##  1st Qu.: 331.5   Lenovo :297   Inspiron 3567     :  29  
##  Median : 659.0   HP     :274   250 G6            :  21  
##  Mean   : 660.2   Asus   :158   Legion Y520-15IKBN:  19  
##  3rd Qu.: 990.5   Acer   :103   Vostro 3568       :  19  
##  Max.   :1320.0   MSI    : 54   Inspiron 5570     :  18  
##                   (Other):120   (Other)           :1167  
##                TypeName       Inches     
##  2 in 1 Convertible:121   Min.   :10.10  
##  Gaming            :205   1st Qu.:14.00  
##  Netbook           : 25   Median :15.60  
##  Notebook          :727   Mean   :15.02  
##  Ultrabook         :196   3rd Qu.:15.60  
##  Workstation       : 29   Max.   :18.40  
##                                          
##                                   ScreenResolution
##  Full HD 1920x1080                        :507    
##  1366x768                                 :281    
##  IPS Panel Full HD 1920x1080              :230    
##  IPS Panel Full HD / Touchscreen 1920x1080: 53    
##  Full HD / Touchscreen 1920x1080          : 47    
##  1600x900                                 : 23    
##  (Other)                                  :162    
##                           Cpu           Ram        
##  Intel Core i5 7200U 2.5GHz :190   Min.   : 2.000  
##  Intel Core i7 7700HQ 2.8GHz:146   1st Qu.: 4.000  
##  Intel Core i7 7500U 2.7GHz :134   Median : 8.000  
##  Intel Core i7 8550U 1.8GHz : 73   Mean   : 8.382  
##  Intel Core i5 8250U 1.6GHz : 72   3rd Qu.: 8.000  
##  Intel Core i5 6200U 2.3GHz : 68   Max.   :64.000  
##  (Other)                    :620                   
##                   Memory                         Gpu     
##  256GB SSD           :412   Intel HD Graphics 620  :281  
##  1TB HDD             :223   Intel HD Graphics 520  :185  
##  500GB HDD           :132   Intel UHD Graphics 620 : 68  
##  512GB SSD           :118   Nvidia GeForce GTX 1050: 66  
##  128GB SSD +  1TB HDD: 94   Nvidia GeForce GTX 1060: 48  
##  128GB SSD           : 76   Nvidia GeForce 940MX   : 43  
##  (Other)             :248   (Other)                :612  
##         OpSys          Weight          Price        Frequenza    
##  Windows 10:1072   Min.   :0.690   Min.   : 174   Min.   :0.900  
##  No OS     :  66   1st Qu.:1.500   1st Qu.: 599   1st Qu.:2.000  
##  Linux     :  62   Median :2.040   Median : 977   Median :2.500  
##  Windows 7 :  45   Mean   :2.039   Mean   :1124   Mean   :2.299  
##  Chrome OS :  27   3rd Qu.:2.300   3rd Qu.:1488   3rd Qu.:2.700  
##  macOS     :  13   Max.   :4.700   Max.   :6099   Max.   :3.600  
##  (Other)   :  18                                                 
##     Risoluzione      Pixel         SolidStateDisk
##  1920x1080:841   Min.   :1049088   False:460     
##  1366x768 :308   1st Qu.:1440000   True :843     
##  3840x2160: 43   Median :2073600                 
##  3200x1800: 27   Mean   :2168807                 
##  1600x900 : 23   3rd Qu.:2073600                 
##  2560x1440: 23   Max.   :8294400                 
##  (Other)  : 38
nums <- sapply(data, is.numeric)
var_numeric <- data[,nums] 
head(var_numeric)
##   X Inches Ram Weight   Price Frequenza   Pixel
## 1 1   13.3   8   1.37 1339.69       2.3 4096000
## 2 2   13.3   8   1.34  898.94       1.8 1296000
## 3 3   15.6   8   1.86  575.00       2.5 2073600
## 4 4   15.4  16   1.83 2537.45       2.7 5184000
## 5 5   13.3   8   1.37 1803.60       3.1 4096000
## 6 6   15.6   4   2.10  400.00       3.0 1049088
data$Weight<-as.numeric(data$Weight)
data$Ram<-as.numeric(data$Ram)
sapply(data, function(x)(sum(is.na(x))))
##                X          Company          Product         TypeName 
##                0                0                0                0 
##           Inches ScreenResolution              Cpu              Ram 
##                0                0                0                0 
##           Memory              Gpu            OpSys           Weight 
##                0                0                0                0 
##            Price        Frequenza      Risoluzione            Pixel 
##                0                0                0                0 
##   SolidStateDisk 
##                0
# Non ci sono missing data!
plot(data$Company,data$Price)

class(data$Ram)
## [1] "numeric"
plot(density(data$Frequenza))

#hist(data$Price, breaks=25, probability=TRUE)
#lines(density(data$Price))
library(ggplot2)
## Registered S3 methods overwritten by 'ggplot2':
##   method         from 
##   [.quosures     rlang
##   c.quosures     rlang
##   print.quosures rlang
ggplot(data,aes(x = Price)) + 
        geom_histogram(aes(y =..density..),
                       bins= 25,
                       fill = "grey",
                       color ="black") +
        geom_vline(xintercept = quantile(data$Price, 0.50), color = "dark red", lty = 2) +
        geom_vline(xintercept = mean(data$Price), color = "dark blue", lty = 2) +
        labs(x = "Price", y ="Density") +
        ggtitle("Price Distribution with mean and median") +
          geom_density()

Quite skewed to the right, mean > media

We could try to apply a correction like Log(Y)

data$LogPrice=log(data$Price)
ggplot(data,aes(x = log(Price))) + 
        geom_histogram(aes(y =..density..),
                       bins= 25,
                       fill = "grey",
                       color ="black") +
        geom_vline(xintercept = quantile(data$LogPrice, 0.50), color = "dark red", lty = 2) +
        geom_vline(xintercept = mean(data$LogPrice), color = "dark blue", lty = 2) +
        labs(x = "log(Price)", y ="Density") +
        ggtitle("log(Price) Distribution with mean and median")+  geom_density()

Now the distribution is looking a bit better (as regards normality)

ggplot(data,aes(x = Price)) + 
        geom_histogram(aes(y =..density..),
                       bins= 25,
                       fill = "grey",
                       color ="black") +
        geom_vline(xintercept = mean(data$Price), color = "dark red") +
        geom_vline(xintercept = mean(data$Price) + sd(data$Price), color = "dark red", lty = 2) +
        geom_vline(xintercept = mean(data$Price) - sd(data$Price), color = "dark red", lty = 2) +
        labs(x = "Price", y ="Density") +
        ggtitle("Price Distribution (mean +/- sd)") +
          geom_density()

ggplot(data,aes(x = log(Price))) + 
        geom_histogram(aes(y =..density..),
                       bins= 25,
                       fill = "grey",
                       color ="black") +
        geom_vline(xintercept = mean(data$LogPrice), color = "dark red") +
        geom_vline(xintercept = mean(data$LogPrice) + sd(data$LogPrice), color = "dark red", lty = 2) +
        geom_vline(xintercept = mean(data$LogPrice) - sd(data$LogPrice), color = "dark red", lty = 2) +
        labs(x = "log(Price)", y ="Density") +
        ggtitle("log(Price) Distribution (mean +/- sd)") +
          geom_density()

ggplot(data,aes(x = Price)) + 
        geom_histogram(aes(y =..density..),
                       bins= 25,
                       fill = "grey",
                       color ="black") +
        geom_vline(xintercept = quantile(data$Price, 0.25), color = "dark red",lty = 2) +
        geom_vline(xintercept = quantile(data$Price, 0.5), color = "dark red", ) +
        geom_vline(xintercept = quantile(data$Price, 0.75), color = "dark red", lty = 2) +
        labs(x = "Price", y ="Density") +
        ggtitle("Price Distribution (quartiles)") +
          geom_density()

ggplot(data,aes(x = log(Price))) + 
        geom_histogram(aes(y =..density..),
                       bins= 25,
                       fill = "grey",
                       color ="black") +
        geom_vline(xintercept = quantile(data$LogPrice, 0.25), color = "dark red",lty = 2) +
        geom_vline(xintercept = quantile(data$LogPrice, 0.5), color = "dark red", ) +
        geom_vline(xintercept = quantile(data$LogPrice, 0.75), color = "dark red", lty = 2) +
           labs(x = "log(Price)", y ="Density") +
        ggtitle("log(Price) Distribution (quartiles)") +
          geom_density()

Descrittive variabile dipendente price

ggplot(data, aes(x = Price, fill = TypeName)) + 
        geom_density(size = 0.6, alpha = .3) +
        labs(x = "Price", y ="Density", fill = "TypeName") +
        ggtitle("Price Density Distribution For TypeName") 

ggplot(data, aes(x = log(Price), fill = TypeName)) + 
        geom_density(size = 0.6, alpha = .3) +
        labs(x = "log(Price)", y ="Density", fill = "TypeName") +
        ggtitle("log(Price) Density Distribution For TypeName") 

ggplot(data, aes(x = Price, fill = SolidStateDisk)) + 
        geom_density(size = 0.6, alpha = .3) +
        labs(x = "Price", y ="Density", fill = "SolidStateDisk") +
        ggtitle("Price Density Distribution For SolidStateDisk") 

ggplot(data, aes(x = log(Price), fill = SolidStateDisk)) + 
        geom_density(size = 0.6, alpha = .3) +
        labs(x = "log(Price)", y ="Density", fill = "SolidStateDisk") +
        ggtitle("log(Price) Density Distribution For SolidStateDisk") 

library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
describe(data$Price)
##    vars    n    mean     sd median trimmed    mad min  max range skew
## X1    1 1303 1123.69 699.01    977 1038.47 619.73 174 6099  5925 1.52
##    kurtosis    se
## X1     4.34 19.36
library(nortest)
# NORMALITA'

boxplot(data$Price)

qqnorm(data$Price);qqline(data$Price)

shapiro.test(data$Price)
## 
##  Shapiro-Wilk normality test
## 
## data:  data$Price
## W = 0.89382, p-value < 2.2e-16
ad.test(data$Price)
## 
##  Anderson-Darling normality test
## 
## data:  data$Price
## A = 28.319, p-value < 2.2e-16
#wilcox.test(data$Price, conf.int = TRUE, mu = ) #worth it?
#if(!require(Envstats)) install.packages("EnvStats")
library(EnvStats)
## 
## Attaching package: 'EnvStats'
## The following objects are masked from 'package:stats':
## 
##     predict, predict.lm
## The following object is masked from 'package:base':
## 
##     print.default
varTest(sample(data$Price), sigma.squared = (sd(data$Price)*sd(data$Price)))
## 
##  Chi-Squared Test on Variance
## 
## data:  sample(data$Price)
## Chi-Squared = 1302, df = 1302, p-value = 0.9896
## alternative hypothesis: true variance is not equal to 488613.6
## 95 percent confidence interval:
##  453149.5 528432.0
## sample estimates:
## variance 
## 488613.6

Trying with the log correction:

# Correzione NORMALITA'
library(nortest)
boxplot(data$LogPrice)

qqnorm(data$LogPrice);qqline(data$LogPrice)

shapiro.test(data$LogPrice) #better than before, but still not normal according to shapiro
## 
##  Shapiro-Wilk normality test
## 
## data:  data$LogPrice
## W = 0.99252, p-value = 3.628e-06
ad.test(data$LogPrice)
## 
##  Anderson-Darling normality test
## 
## data:  data$LogPrice
## A = 2.5942, p-value = 1.515e-06

T-test

# One sample
ref <- mean(data$Price)
Apple<-data$Price[data$Company=="Apple"]
t.test(Apple,mu=ref,alternative = "greater")
## 
##  One Sample t-test
## 
## data:  Apple
## t = 3.5944, df = 20, p-value = 0.000906
## alternative hypothesis: true mean is greater than 1123.687
## 95 percent confidence interval:
##  1352.823      Inf
## sample estimates:
## mean of x 
##  1564.199
# Wilcoxon Signed Rank Test
wilcox.test(Apple, mu=ref, conf.int = TRUE)
## 
##  Wilcoxon signed rank test
## 
## data:  Apple
## V = 206, p-value = 0.0008516
## alternative hypothesis: true location is not equal to 1123.687
## 95 percent confidence interval:
##  1234.50 1829.26
## sample estimates:
## (pseudo)median 
##       1514.275
#Two sample
Other <-data$Price[data$Company!="Apple"]
wilcox.test(Apple, Other, alternative = "g")
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Apple and Other
## W = 19689, p-value = 0.0001358
## alternative hypothesis: true location shift is greater than 0
# F test sulla varianza
var.test(Apple, Other, alternative = "two.sided")
## 
##  F test to compare two variances
## 
## data:  Apple and Other
## F = 0.64574, num df = 20, denom df = 1281, p-value = 0.2401
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.3755878 1.3509884
## sample estimates:
## ratio of variances 
##          0.6457382

Variabili qualitative: tabella di contingenza e chi quadro

b<-data
b.table<-table(b$SolidStateDisk,b$TypeName)
b.table
##        
##         2 in 1 Convertible Gaming Netbook Notebook Ultrabook Workstation
##   False                 26     27      13      372        16           6
##   True                  95    178      12      355       180          23
prop.table(b.table,2)
##        
##         2 in 1 Convertible     Gaming    Netbook   Notebook  Ultrabook
##   False         0.21487603 0.13170732 0.52000000 0.51169188 0.08163265
##   True          0.78512397 0.86829268 0.48000000 0.48830812 0.91836735
##        
##         Workstation
##   False  0.20689655
##   True   0.79310345
# chi square test 

chisq.test(b.table)
## 
##  Pearson's Chi-squared test
## 
## data:  b.table
## X-squared = 203.18, df = 5, p-value < 2.2e-16
chi=chisq.test(b.table)
chi_norm=chi$statistic/(nrow(b)*min(nrow(b.table)-1,ncol(b.table)-1))
chi_norm
## X-squared 
## 0.1559288
summary(b.table)
## Number of cases in table: 1303 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 203.18, df = 5, p-value = 5.944e-42

Correlazione per variabili quantitative

# seleziona solo variabili quantitative
nums <- sapply(data, is.numeric)
var_numeric <- data[,nums] 
head(var_numeric)
##   X Inches Ram Weight   Price Frequenza   Pixel LogPrice
## 1 1   13.3   8   1.37 1339.69       2.3 4096000 7.200194
## 2 2   13.3   8   1.34  898.94       1.8 1296000 6.801216
## 3 3   15.6   8   1.86  575.00       2.5 2073600 6.354370
## 4 4   15.4  16   1.83 2537.45       2.7 5184000 7.838915
## 5 5   13.3   8   1.37 1803.60       3.1 4096000 7.497540
## 6 6   15.6   4   2.10  400.00       3.0 1049088 5.991465
var_numeric$X=NULL
# Matrice di correlazione
R<-cor(var_numeric)            
R
##                Inches       Ram      Weight      Price Frequenza
## Inches     1.00000000 0.2379928  0.82763110 0.06819667 0.3078698
## Ram        0.23799280 1.0000000  0.38387409 0.74300714 0.3680005
## Weight     0.82763110 0.3838741  1.00000000 0.21036980 0.3204336
## Price      0.06819667 0.7430071  0.21036980 1.00000000 0.4302931
## Frequenza  0.30786980 0.3680005  0.32043359 0.43029310 1.0000000
## Pixel     -0.08639917 0.3963585 -0.04403379 0.51548639 0.1352935
## LogPrice   0.04432871 0.6848033  0.15167383 0.92758068 0.5041461
##                 Pixel   LogPrice
## Inches    -0.08639917 0.04432871
## Ram        0.39635848 0.68480333
## Weight    -0.04403379 0.15167383
## Price      0.51548639 0.92758068
## Frequenza  0.13529350 0.50414608
## Pixel      1.00000000 0.48490475
## LogPrice   0.48490475 1.00000000
# Test di correlazione. (Spearsman's o Kendall tau)
cor.test(var_numeric$Inches, var_numeric$Weight)
## 
##  Pearson's product-moment correlation
## 
## data:  var_numeric$Inches and var_numeric$Weight
## t = 53.187, df = 1301, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8097181 0.8440031
## sample estimates:
##       cor 
## 0.8276311
#corrgram(var_numeric)

# Correlazione come grafo
library(qgraph)
## Registered S3 methods overwritten by 'huge':
##   method    from   
##   plot.sim  BDgraph
##   print.sim BDgraph
detcor=cor(as.matrix(var_numeric), method="pearson")
round(detcor, 2)
##           Inches  Ram Weight Price Frequenza Pixel LogPrice
## Inches      1.00 0.24   0.83  0.07      0.31 -0.09     0.04
## Ram         0.24 1.00   0.38  0.74      0.37  0.40     0.68
## Weight      0.83 0.38   1.00  0.21      0.32 -0.04     0.15
## Price       0.07 0.74   0.21  1.00      0.43  0.52     0.93
## Frequenza   0.31 0.37   0.32  0.43      1.00  0.14     0.50
## Pixel      -0.09 0.40  -0.04  0.52      0.14  1.00     0.48
## LogPrice    0.04 0.68   0.15  0.93      0.50  0.48     1.00
# plot corr matrix: green positive red negative 
qgraph(detcor, shape="circle", posCol="darkgreen", negCol="darkred")

Boxplot di confronto (pre-anova)

boxplot(data$Price~data$Company, 
        main="Boxplot Prezzo per compagnia", 
        col= rainbow(6), 
        horizontal = F)

boxplot(data$Price~data$SolidStateDisk, 
        main="Prezzo vs ssd", 
        col= rainbow(2), 
        horizontal = F)

ANOVA

A una via

lmA = lm(Price ~ SolidStateDisk, data=data)
summary(lmA)
## 
## Call:
## lm(formula = Price ~ SolidStateDisk, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1214.8  -343.3   -96.8   261.1  4710.2 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          637.86      27.98   22.80   <2e-16 ***
## SolidStateDiskTrue   750.93      34.78   21.59   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 600 on 1301 degrees of freedom
## Multiple R-squared:  0.2638, Adjusted R-squared:  0.2632 
## F-statistic: 466.2 on 1 and 1301 DF,  p-value: < 2.2e-16
drop1(lmA, test = 'F')
## Single term deletions
## 
## Model:
## Price ~ SolidStateDisk
##                Df Sum of Sq       RSS   AIC F value    Pr(>F)    
## <none>                      468355897 16672                      
## SolidStateDisk  1 167819064 636174961 17069  466.17 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lmA)
## Analysis of Variance Table
## 
## Response: Price
##                  Df    Sum Sq   Mean Sq F value    Pr(>F)    
## SolidStateDisk    1 167819064 167819064  466.17 < 2.2e-16 ***
## Residuals      1301 468355897    359997                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(lsmeans)
## Loading required package: emmeans
## The 'lsmeans' package is now basically a front end for 'emmeans'.
## Users are encouraged to switch the rest of the way.
## See help('transition') for more information, including how to
## convert old 'lsmeans' objects and scripts to work with 'emmeans'.
ls_SolidStateDisk = lsmeans(lmA,pairwise ~ SolidStateDisk,adjust = 'tukey')
ls_SolidStateDisk$contrasts 
##  contrast     estimate   SE   df t.ratio p.value
##  False - True     -751 34.8 1301 -21.591 <.0001
ls_SolidStateDisk$lsmeans
##  SolidStateDisk lsmean   SE   df lower.CL upper.CL
##  False             638 28.0 1301      583      693
##  True             1389 20.7 1301     1348     1429
## 
## Confidence level used: 0.95
plot(ls_SolidStateDisk$lsmeans, alpha = .05)

lmB = lm(Price ~ Company, data=data)
summary(lmB)
## 
## Call:
## lm(formula = Price ~ Company, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2317.1  -452.8  -127.4   288.5  3812.6 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        626.78      63.43   9.881  < 2e-16 ***
## CompanyApple       937.42     154.14   6.082 1.57e-09 ***
## CompanyAsus        477.39      81.53   5.856 6.03e-09 ***
## CompanyChuwi      -312.48     377.06  -0.829 0.407416    
## CompanyDell        559.29      73.62   7.597 5.80e-14 ***
## CompanyFujitsu     102.22     377.06   0.271 0.786352    
## CompanyGoogle     1050.89     377.06   2.787 0.005397 ** 
## CompanyHP          441.00      74.41   5.927 3.96e-09 ***
## CompanyHuawei      797.22     459.62   1.735 0.083065 .  
## CompanyLenovo      459.61      73.62   6.243 5.81e-10 ***
## CompanyLG         1472.22     377.06   3.904 9.93e-05 ***
## CompanyMediacom   -331.78     251.46  -1.319 0.187270    
## CompanyMicrosoft   985.53     270.37   3.645 0.000278 ***
## CompanyMSI        1102.13     108.16  10.190  < 2e-16 ***
## CompanyRazer      2719.37     251.46  10.814  < 2e-16 ***
## CompanySamsung     786.67     223.77   3.515 0.000454 ***
## CompanyToshiba     641.04     112.51   5.698 1.50e-08 ***
## CompanyVero       -409.35     328.08  -1.248 0.212365    
## CompanyXiaomi      506.69     328.08   1.544 0.122740    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 643.8 on 1284 degrees of freedom
## Multiple R-squared:  0.1635, Adjusted R-squared:  0.1518 
## F-statistic: 13.94 on 18 and 1284 DF,  p-value: < 2.2e-16
drop1(lmB, test = 'F')
## Single term deletions
## 
## Model:
## Price ~ Company
##         Df Sum of Sq       RSS   AIC F value    Pr(>F)    
## <none>               532160971 16873                      
## Company 18 104013991 636174961 17069  13.943 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lmB)
## Analysis of Variance Table
## 
## Response: Price
##             Df    Sum Sq Mean Sq F value    Pr(>F)    
## Company     18 104013991 5778555  13.943 < 2.2e-16 ***
## Residuals 1284 532160971  414456                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ls_Company = lsmeans(lmB,pairwise ~ Company,adjust = 'tukey')
#ls_Company$contrasts #too long to be printed
ls_Company$lsmeans
##  Company   lsmean    SE   df lower.CL upper.CL
##  Acer         627  63.4 1284  502.331      751
##  Apple       1564 140.5 1284 1288.594     1840
##  Asus        1104  51.2 1284 1003.692     1205
##  Chuwi        314 371.7 1284 -414.885     1043
##  Dell        1186  37.4 1284 1112.783     1259
##  Fujitsu      729 371.7 1284   -0.182     1458
##  Google      1678 371.7 1284  948.485     2407
##  HP          1068  38.9 1284  991.475     1144
##  Huawei      1424 455.2 1284  530.938     2317
##  Lenovo      1086  37.4 1284 1013.099     1160
##  LG          2099 371.7 1284 1369.818     2828
##  Mediacom     295 243.3 1284 -182.362      772
##  Microsoft   1612 262.8 1284 1096.699     2128
##  MSI         1729  87.6 1284 1557.038     1901
##  Razer       3346 243.3 1284 2868.781     3824
##  Samsung     1413 214.6 1284  992.451     1834
##  Toshiba     1268  92.9 1284 1085.517     1450
##  Vero         217 321.9 1284 -414.065      849
##  Xiaomi      1133 321.9 1284  501.972     1765
## 
## Confidence level used: 0.95
plot(ls_Company$lsmeans, alpha = .05)

lmC = lm(Price ~ TypeName, data=data)
summary(lmC)
## 
## Call:
## lm(formula = Price ~ TypeName, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1049.2  -381.7   -98.1   267.6  4367.6 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          1282.40      50.01  25.642  < 2e-16 ***
## TypeNameGaming        448.98      63.07   7.119 1.79e-12 ***
## TypeNameNetbook      -646.17     120.86  -5.347 1.06e-07 ***
## TypeNameNotebook     -500.32      54.01  -9.263  < 2e-16 ***
## TypeNameUltrabook     265.83      63.60   4.180 3.12e-05 ***
## TypeNameWorkstation   997.96     113.74   8.774  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 550.1 on 1297 degrees of freedom
## Multiple R-squared:  0.383,  Adjusted R-squared:  0.3806 
## F-statistic:   161 on 5 and 1297 DF,  p-value: < 2.2e-16
drop1(lmC, test = 'F')
## Single term deletions
## 
## Model:
## Price ~ TypeName
##          Df Sum of Sq       RSS   AIC F value    Pr(>F)    
## <none>                392518380 16450                      
## TypeName  5 243656581 636174961 17069  161.02 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lmC)
## Analysis of Variance Table
## 
## Response: Price
##             Df    Sum Sq  Mean Sq F value    Pr(>F)    
## TypeName     5 243656581 48731316  161.02 < 2.2e-16 ***
## Residuals 1297 392518380   302636                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ls_TypeName = lsmeans(lmC,pairwise ~ TypeName,adjust = 'tukey')
ls_TypeName$contrasts 
##  contrast                         estimate    SE   df t.ratio p.value
##  2 in 1 Convertible - Gaming          -449  63.1 1297  -7.119 <.0001 
##  2 in 1 Convertible - Netbook          646 120.9 1297   5.347 <.0001 
##  2 in 1 Convertible - Notebook         500  54.0 1297   9.263 <.0001 
##  2 in 1 Convertible - Ultrabook       -266  63.6 1297  -4.180 0.0004 
##  2 in 1 Convertible - Workstation     -998 113.7 1297  -8.774 <.0001 
##  Gaming - Netbook                     1095 116.5 1297   9.397 <.0001 
##  Gaming - Notebook                     949  43.5 1297  21.821 <.0001 
##  Gaming - Ultrabook                    183  55.0 1297   3.333 0.0114 
##  Gaming - Workstation                 -549 109.1 1297  -5.030 <.0001 
##  Netbook - Notebook                   -146 111.9 1297  -1.303 0.7833 
##  Netbook - Ultrabook                  -912 116.8 1297  -7.806 <.0001 
##  Netbook - Workstation               -1644 150.1 1297 -10.951 <.0001 
##  Notebook - Ultrabook                 -766  44.3 1297 -17.304 <.0001 
##  Notebook - Workstation              -1498 104.2 1297 -14.383 <.0001 
##  Ultrabook - Workstation              -732 109.5 1297  -6.689 <.0001 
## 
## P value adjustment: tukey method for comparing a family of 6 estimates
ls_TypeName$lsmeans
##  TypeName           lsmean    SE   df lower.CL upper.CL
##  2 in 1 Convertible   1282  50.0 1297     1184     1381
##  Gaming               1731  38.4 1297     1656     1807
##  Netbook               636 110.0 1297      420      852
##  Notebook              782  20.4 1297      742      822
##  Ultrabook            1548  39.3 1297     1471     1625
##  Workstation          2280 102.2 1297     2080     2481
## 
## Confidence level used: 0.95
plot(ls_TypeName$lsmeans, alpha = .05)

library(coefplot)
#library(forestmodel)
coefplot(lmC, intercept = FALSE)

par(mfrow = c(2,2))
plot(lmC)

#(not) normal distribution of residuals
par(mfrow=c(1,2))
boxplot(lmC$residuals)
qqnorm(lmC$residuals);qqline(lmC$residuals)

ad.test(lmC$residuals)
## 
##  Anderson-Darling normality test
## 
## data:  lmC$residuals
## A = 22.667, p-value < 2.2e-16
shapiro.test(lmC$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  lmC$residuals
## W = 0.89641, p-value < 2.2e-16
#let's try again with the log correction
lmC_log = lm(log(Price) ~ TypeName, data=data)
summary(lmC_log)#R^2 increases
## 
## Call:
## lm(formula = log(Price) ~ TypeName, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.40971 -0.33589  0.00698  0.33215  1.96853 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          7.02648    0.04379 160.456  < 2e-16 ***
## TypeNameGaming       0.33865    0.05522   6.133 1.15e-09 ***
## TypeNameNetbook     -0.91149    0.10583  -8.613  < 2e-16 ***
## TypeNameNotebook    -0.49823    0.04729 -10.534  < 2e-16 ***
## TypeNameUltrabook    0.26648    0.05569   4.785 1.91e-06 ***
## TypeNameWorkstation  0.66479    0.09959   6.675 3.65e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4817 on 1297 degrees of freedom
## Multiple R-squared:  0.4061, Adjusted R-squared:  0.4038 
## F-statistic: 177.4 on 5 and 1297 DF,  p-value: < 2.2e-16
drop1(lmC_log, test = 'F')
## Single term deletions
## 
## Model:
## log(Price) ~ TypeName
##          Df Sum of Sq    RSS     AIC F value    Pr(>F)    
## <none>                300.95 -1897.5                      
## TypeName  5    205.76 506.71 -1228.7  177.36 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lmC_log)
## Analysis of Variance Table
## 
## Response: log(Price)
##             Df Sum Sq Mean Sq F value    Pr(>F)    
## TypeName     5 205.76  41.152  177.36 < 2.2e-16 ***
## Residuals 1297 300.95   0.232                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ls_TypeName_log = lsmeans(lmC_log,pairwise ~ TypeName,adjust = 'tukey')
ls_TypeName_log$contrasts 
##  contrast                         estimate     SE   df t.ratio p.value
##  2 in 1 Convertible - Gaming       -0.3387 0.0552 1297  -6.133 <.0001 
##  2 in 1 Convertible - Netbook       0.9115 0.1058 1297   8.613 <.0001 
##  2 in 1 Convertible - Notebook      0.4982 0.0473 1297  10.534 <.0001 
##  2 in 1 Convertible - Ultrabook    -0.2665 0.0557 1297  -4.785 <.0001 
##  2 in 1 Convertible - Workstation  -0.6648 0.0996 1297  -6.675 <.0001 
##  Gaming - Netbook                   1.2501 0.1020 1297  12.251 <.0001 
##  Gaming - Notebook                  0.8369 0.0381 1297  21.970 <.0001 
##  Gaming - Ultrabook                 0.0722 0.0481 1297   1.500 0.6644 
##  Gaming - Workstation              -0.3261 0.0956 1297  -3.413 0.0087 
##  Netbook - Notebook                -0.4133 0.0980 1297  -4.218 0.0004 
##  Netbook - Ultrabook               -1.1780 0.1023 1297 -11.515 <.0001 
##  Netbook - Workstation             -1.5763 0.1315 1297 -11.990 <.0001 
##  Notebook - Ultrabook              -0.7647 0.0388 1297 -19.725 <.0001 
##  Notebook - Workstation            -1.1630 0.0912 1297 -12.750 <.0001 
##  Ultrabook - Workstation           -0.3983 0.0958 1297  -4.156 0.0005 
## 
## Results are given on the log (not the response) scale. 
## P value adjustment: tukey method for comparing a family of 6 estimates
ls_TypeName_log$lsmeans
##  TypeName           lsmean     SE   df lower.CL upper.CL
##  2 in 1 Convertible   7.03 0.0438 1297     6.94     7.11
##  Gaming               7.37 0.0336 1297     7.30     7.43
##  Netbook              6.11 0.0963 1297     5.93     6.30
##  Notebook             6.53 0.0179 1297     6.49     6.56
##  Ultrabook            7.29 0.0344 1297     7.23     7.36
##  Workstation          7.69 0.0894 1297     7.52     7.87
## 
## Results are given on the log (not the response) scale. 
## Confidence level used: 0.95
plot(ls_TypeName_log$lsmeans, alpha = .05)

coefplot(lmC_log, intercept = FALSE)

par(mfrow = c(2,2))
plot(lmC_log)

#(not) normal distribution of residuals
par(mfrow=c(1,2))
boxplot(lmC_log$residuals)
qqnorm(lmC_log$residuals);qqline(lmC_log$residuals)

ad.test(lmC_log$residuals) #normal now!
## 
##  Anderson-Darling normality test
## 
## data:  lmC_log$residuals
## A = 0.51757, p-value = 0.1886
shapiro.test(lmC_log$residuals) #borderline now!
## 
##  Shapiro-Wilk normality test
## 
## data:  lmC_log$residuals
## W = 0.99764, p-value = 0.05462

A due vie

# Con interazione
lmC = lm(Price ~ Company*TypeName  , data=data)
drop1(lmC, test="F")
## Single term deletions
## 
## Model:
## Price ~ Company * TypeName
##                  Df Sum of Sq       RSS   AIC F value    Pr(>F)    
## <none>                        320739568 16273                      
## Company:TypeName 25  29159364 349898932 16336  4.5602 1.181e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#summary(lmC) #FIXME: too long to be printed

lmC = lm(Price ~ Company+TypeName  , data=data)
# type I effects A, B/A   C/A,B  
anova(lmC)
## Analysis of Variance Table
## 
## Response: Price
##             Df    Sum Sq  Mean Sq F value    Pr(>F)    
## Company     18 104013991  5778555  21.123 < 2.2e-16 ***
## TypeName     5 182262038 36452408 133.246 < 2.2e-16 ***
## Residuals 1279 349898932   273572                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# type III effects A/B,C , B/A,C   C/A,B
drop1(lmC, test="F")
## Single term deletions
## 
## Model:
## Price ~ Company + TypeName
##          Df Sum of Sq       RSS   AIC  F value    Pr(>F)    
## <none>                349898932 16336                       
## Company  18  42619448 392518380 16450   8.6549 < 2.2e-16 ***
## TypeName  5 182262038 532160971 16873 133.2460 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(lmC)
## 
## Call:
## lm(formula = Price ~ Company + TypeName, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2147.6  -343.2   -81.9   243.1  4081.9 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           991.52      69.88  14.189  < 2e-16 ***
## CompanyApple          383.70     132.62   2.893  0.00388 ** 
## CompanyAsus           168.81      67.79   2.490  0.01290 *  
## CompanyChuwi         -180.68     306.47  -0.590  0.55559    
## CompanyDell           350.73      60.52   5.796 8.56e-09 ***
## CompanyFujitsu        234.02     306.47   0.764  0.44525    
## CompanyGoogle         497.17     309.44   1.607  0.10837    
## CompanyHP             337.48      60.85   5.546 3.55e-08 ***
## CompanyHuawei         243.50     375.96   0.648  0.51731    
## CompanyLenovo         322.12      60.24   5.348 1.05e-07 ***
## CompanyLG             918.50     309.44   2.968  0.00305 ** 
## CompanyMediacom      -270.91     204.43  -1.325  0.18534    
## CompanyMicrosoft      431.81     223.95   1.928  0.05406 .  
## CompanyMSI            311.10      98.62   3.155  0.00165 ** 
## CompanyRazer         1996.14     207.24   9.632  < 2e-16 ***
## CompanySamsung        438.88     183.82   2.388  0.01710 *  
## CompanyToshiba        601.45      92.12   6.529 9.52e-11 ***
## CompanyVero          -277.55     266.70  -1.041  0.29821    
## CompanyXiaomi         295.72     267.40   1.106  0.26896    
## TypeNameGaming        426.29      65.51   6.507 1.10e-10 ***
## TypeNameNetbook      -600.94     115.75  -5.192 2.42e-07 ***
## TypeNameNotebook     -496.54      51.98  -9.552  < 2e-16 ***
## TypeNameUltrabook     188.98      63.81   2.962  0.00312 ** 
## TypeNameWorkstation   948.46     109.22   8.684  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 523 on 1279 degrees of freedom
## Multiple R-squared:   0.45,  Adjusted R-squared:  0.4401 
## F-statistic:  45.5 on 23 and 1279 DF,  p-value: < 2.2e-16
# contrasti
library(lsmeans)
ls=lsmeans(lmC, #FIXME: @Andrea, c'era lmB ma credo tu volessi scrivere lmC, in case check it
           pairwise ~ TypeName ,
           adjust="tukey")
ls$lsmeans
##  TypeName           lsmean    SE   df lower.CL upper.CL
##  2 in 1 Convertible   1350  68.9 1279     1214     1485
##  Gaming               1776  64.4 1279     1649     1902
##  Netbook               749 115.9 1279      521      976
##  Notebook              853  52.0 1279      751      955
##  Ultrabook            1538  55.5 1279     1430     1647
##  Workstation          2298 110.1 1279     2082     2514
## 
## Results are averaged over the levels of: Company 
## Confidence level used: 0.95
# plot lsmeans and 95% confid int
plot(ls$lsmeans, alpha = .05)

# contrasts between predicted lsmeans
ls$contrasts
##  contrast                         estimate    SE   df t.ratio p.value
##  2 in 1 Convertible - Gaming          -426  65.5 1279  -6.507 <.0001 
##  2 in 1 Convertible - Netbook          601 115.7 1279   5.192 <.0001 
##  2 in 1 Convertible - Notebook         497  52.0 1279   9.552 <.0001 
##  2 in 1 Convertible - Ultrabook       -189  63.8 1279  -2.962 0.0367 
##  2 in 1 Convertible - Workstation     -948 109.2 1279  -8.684 <.0001 
##  Gaming - Netbook                     1027 114.5 1279   8.972 <.0001 
##  Gaming - Notebook                     923  49.4 1279  18.671 <.0001 
##  Gaming - Ultrabook                    237  61.1 1279   3.882 0.0015 
##  Gaming - Workstation                 -522 108.3 1279  -4.820 <.0001 
##  Netbook - Notebook                   -104 107.0 1279  -0.975 0.9258 
##  Netbook - Ultrabook                  -790 113.3 1279  -6.969 <.0001 
##  Netbook - Workstation               -1549 143.8 1279 -10.774 <.0001 
##  Notebook - Ultrabook                 -686  46.5 1279 -14.754 <.0001 
##  Notebook - Workstation              -1445  99.8 1279 -14.475 <.0001 
##  Ultrabook - Workstation              -759 106.4 1279  -7.138 <.0001 
## 
## Results are averaged over the levels of: Company 
## P value adjustment: tukey method for comparing a family of 6 estimates
# if at least one contrast is significant, the variable
# is significant in the anova table # drop1 effects

# contrast among predicted lsmeans and overall lsmean
c= contrast(ls, method = "eff")
c
## $lsmeans
##  contrast                  estimate   SE   df t.ratio p.value
##  2 in 1 Convertible effect    -77.7 47.9 1279  -1.623 0.1048 
##  Gaming effect                348.6 46.0 1279   7.583 <.0001 
##  Netbook effect              -678.6 90.2 1279  -7.521 <.0001 
##  Notebook effect             -574.2 31.8 1279 -18.032 <.0001 
##  Ultrabook effect             111.3 43.8 1279   2.542 0.0134 
##  Workstation effect           870.7 84.6 1279  10.287 <.0001 
## 
## Results are averaged over the levels of: Company 
## P value adjustment: fdr method for 6 tests 
## 
## $contrasts
##  contrast                                estimate    SE   df t.ratio
##  2 in 1 Convertible - Gaming effect        -150.6  71.6 1279  -2.103
##  2 in 1 Convertible - Netbook effect        876.6 121.9 1279   7.192
##  2 in 1 Convertible - Notebook effect       772.2  51.2 1279  15.077
##  2 in 1 Convertible - Ultrabook effect       86.7  57.9 1279   1.498
##  2 in 1 Convertible - Workstation effect   -672.8  74.0 1279  -9.093
##  Gaming - Netbook effect                   1302.9 123.6 1279  10.544
##  Gaming - Notebook effect                  1198.5  55.4 1279  21.649
##  Gaming - Ultrabook effect                  513.0  60.9 1279   8.416
##  Gaming - Workstation effect               -246.5  77.4 1279  -3.186
##  Netbook - Notebook effect                  171.2 107.0 1279   1.600
##  Netbook - Ultrabook effect                -514.3 110.5 1279  -4.655
##  Netbook - Workstation effect             -1273.7 119.6 1279 -10.649
##  Notebook - Ultrabook effect               -409.9  55.3 1279  -7.416
##  Notebook - Workstation effect            -1169.3  71.6 1279 -16.325
##  Ultrabook - Workstation effect            -483.8  84.4 1279  -5.730
##  p.value
##  0.0411 
##  <.0001 
##  <.0001 
##  0.1345 
##  <.0001 
##  <.0001 
##  <.0001 
##  <.0001 
##  0.0018 
##  0.1177 
##  <.0001 
##  <.0001 
##  <.0001 
##  <.0001 
##  <.0001 
## 
## Results are averaged over the levels of: Company 
## P value adjustment: fdr method for 15 tests
library(coefplot)
coefplot(lmC, intercept=FALSE) #FIXME: @Andrea, same goes here

ANOVA k way

lmK = lm(Price ~ Company+TypeName+SolidStateDisk  , data=data)
summary(lmK)
## 
## Call:
## lm(formula = Price ~ Company + TypeName + SolidStateDisk, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2113.2  -301.6   -49.8   210.4  3862.4 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           689.90      67.44  10.230  < 2e-16 ***
## CompanyApple          514.90     122.54   4.202 2.83e-05 ***
## CompanyAsus           130.26      62.53   2.083  0.03744 *  
## CompanyChuwi          -16.66     282.67  -0.059  0.95300    
## CompanyDell           280.98      55.97   5.020 5.88e-07 ***
## CompanyFujitsu         84.49     282.64   0.299  0.76505    
## CompanyGoogle         404.40     285.26   1.418  0.15653    
## CompanyHP             272.28      56.25   4.840 1.45e-06 ***
## CompanyHuawei         150.73     346.56   0.435  0.66368    
## CompanyLenovo         235.35      55.81   4.217 2.65e-05 ***
## CompanyLG             825.73     285.26   2.895  0.00386 ** 
## CompanyMediacom      -356.00     188.50  -1.889  0.05918 .  
## CompanyMicrosoft      339.04     206.50   1.642  0.10087    
## CompanyMSI            178.78      91.32   1.958  0.05047 .  
## CompanyRazer         1868.90     191.19   9.775  < 2e-16 ***
## CompanySamsung        387.48     169.45   2.287  0.02238 *  
## CompanyToshiba        436.72      85.60   5.102 3.87e-07 ***
## CompanyVero          -113.54     246.04  -0.461  0.64456    
## CompanyXiaomi          96.18     246.80   0.390  0.69680    
## TypeNameGaming        398.61      60.40   6.599 6.05e-11 ***
## TypeNameNetbook      -473.92     107.01  -4.429 1.03e-05 ***
## TypeNameNotebook     -358.93      48.77  -7.360 3.28e-13 ***
## TypeNameUltrabook     113.04      59.02   1.915  0.05570 .  
## TypeNameWorkstation   946.96     100.66   9.407  < 2e-16 ***
## SolidStateDiskTrue    470.33      31.17  15.089  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 482.1 on 1278 degrees of freedom
## Multiple R-squared:  0.5332, Adjusted R-squared:  0.5244 
## F-statistic: 60.82 on 24 and 1278 DF,  p-value: < 2.2e-16
drop1(lmK, test="F") # type III SS
## Single term deletions
## 
## Model:
## Price ~ Company + TypeName + SolidStateDisk
##                Df Sum of Sq       RSS   AIC  F value    Pr(>F)    
## <none>                      296988657 16125                       
## Company        18  33990309 330978966 16230   8.1259 < 2.2e-16 ***
## TypeName        5 109128253 406116910 16523  93.9200 < 2.2e-16 ***
## SolidStateDisk  1  52910275 349898932 16336 227.6832 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coefplot(lmK, intercept=FALSE)

Regressione lineare

lmA<-lm(Price ~ Frequenza  , data=data)
summary(lmA)
## 
## Call:
## lm(formula = Price ~ Frequenza, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1467.6  -453.8  -119.6   327.6  4618.2 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -241.84      81.32  -2.974    0.003 ** 
## Frequenza     594.02      34.55  17.194   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 631.2 on 1301 degrees of freedom
## Multiple R-squared:  0.1852, Adjusted R-squared:  0.1845 
## F-statistic: 295.6 on 1 and 1301 DF,  p-value: < 2.2e-16
plot(data$Frequenza,data$Price)
abline(lmA,col="red")

lmA<-lm(Price ~ Frequenza+Pixel+Ram  , data=data)
summary(lmA)
## 
## Call:
## lm(formula = Price ~ Frequenza + Pixel + Ram, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1785.72  -257.23   -66.06   191.11  2791.53 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.076e+02  5.547e+01  -7.349 3.52e-13 ***
## Frequenza    2.549e+02  2.474e+01  10.306  < 2e-16 ***
## Pixel        1.329e-04  9.117e-06  14.575  < 2e-16 ***
## Ram          7.839e+01  2.658e+00  29.488  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 420.2 on 1299 degrees of freedom
## Multiple R-squared:  0.6395, Adjusted R-squared:  0.6386 
## F-statistic:   768 on 3 and 1299 DF,  p-value: < 2.2e-16
coefplot(lmA, intercept=FALSE)

ANCOVA

lmK = lm(Price ~ Company+TypeName+SolidStateDisk+ Frequenza+Pixel+Ram  , data=data)
summary(lmK)
## 
## Call:
## lm(formula = Price ~ Company + TypeName + SolidStateDisk + Frequenza + 
##     Pixel + Ram, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1838.5  -211.8   -28.2   169.3  1894.4 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -1.491e+02  6.691e+01  -2.229  0.02602 *  
## CompanyApple         2.826e+02  9.043e+01   3.125  0.00182 ** 
## CompanyAsus          5.438e+01  4.587e+01   1.185  0.23609    
## CompanyChuwi        -7.683e+01  2.082e+02  -0.369  0.71213    
## CompanyDell          1.124e+02  4.132e+01   2.720  0.00662 ** 
## CompanyFujitsu       5.168e+01  2.071e+02   0.250  0.80294    
## CompanyGoogle        3.062e+02  2.105e+02   1.455  0.14602    
## CompanyHP            2.045e+02  4.134e+01   4.947 8.54e-07 ***
## CompanyHuawei        5.510e+01  2.539e+02   0.217  0.82822    
## CompanyLenovo        1.260e+02  4.108e+01   3.066  0.00221 ** 
## CompanyLG            6.759e+02  2.090e+02   3.235  0.00125 ** 
## CompanyMediacom     -1.108e+02  1.392e+02  -0.796  0.42603    
## CompanyMicrosoft     2.369e+02  1.515e+02   1.564  0.11807    
## CompanyMSI           2.046e+02  6.686e+01   3.061  0.00225 ** 
## CompanyRazer         1.085e+03  1.428e+02   7.594 5.95e-14 ***
## CompanySamsung       9.436e+01  1.246e+02   0.757  0.44896    
## CompanyToshiba       2.871e+02  6.306e+01   4.553 5.79e-06 ***
## CompanyVero          1.440e+01  1.811e+02   0.080  0.93663    
## CompanyXiaomi       -1.743e+01  1.808e+02  -0.096  0.92322    
## TypeNameGaming      -2.977e+01  4.812e+01  -0.619  0.53621    
## TypeNameNetbook     -1.142e+02  7.947e+01  -1.437  0.15105    
## TypeNameNotebook    -2.440e+02  3.642e+01  -6.700 3.11e-11 ***
## TypeNameUltrabook    9.405e+01  4.338e+01   2.168  0.03034 *  
## TypeNameWorkstation  7.172e+02  7.500e+01   9.562  < 2e-16 ***
## SolidStateDiskTrue   1.997e+02  2.432e+01   8.212 5.28e-16 ***
## Frequenza            1.701e+02  2.320e+01   7.335 3.94e-13 ***
## Pixel                8.315e-05  8.292e-06  10.028  < 2e-16 ***
## Ram                  6.541e+01  2.578e+00  25.368  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 352.9 on 1275 degrees of freedom
## Multiple R-squared:  0.7504, Adjusted R-squared:  0.7452 
## F-statistic:   142 on 27 and 1275 DF,  p-value: < 2.2e-16
drop1(lmK, .~., test="F")
## Single term deletions
## 
## Model:
## Price ~ Company + TypeName + SolidStateDisk + Frequenza + Pixel + 
##     Ram
##                Df Sum of Sq       RSS   AIC  F value    Pr(>F)    
## <none>                      158760389 15315                       
## Company        18  13404444 172164833 15384   5.9806 4.092e-14 ***
## TypeName        5  35077529 193837917 15565  56.3413 < 2.2e-16 ***
## SolidStateDisk  1   8397143 167157532 15380  67.4372 5.281e-16 ***
## Frequenza       1   6698755 165459144 15367  53.7975 3.940e-13 ***
## Pixel           1  12521049 171281438 15412 100.5562 < 2.2e-16 ***
## Ram             1  80130237 238890626 15845 643.5236 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ls=lsmeans(lmK,
           pairwise ~ Company ,
           adjust="tukey")
c= contrast(ls, method = "eff")
#c #FIXME: too long to be printed
data$LogPrice=NULL
data$Product=NULL
data$X=NULL
str(data)
## 'data.frame':    1303 obs. of  15 variables:
##  $ Company         : Factor w/ 19 levels "Acer","Apple",..: 2 2 8 2 2 1 2 2 3 1 ...
##  $ TypeName        : Factor w/ 6 levels "2 in 1 Convertible",..: 5 5 4 5 5 4 5 5 5 5 ...
##  $ Inches          : num  13.3 13.3 15.6 15.4 13.3 15.6 15.4 13.3 14 14 ...
##  $ ScreenResolution: Factor w/ 40 levels "1366x768","1440x900",..: 24 2 9 26 24 1 26 2 9 16 ...
##  $ Cpu             : Factor w/ 118 levels "AMD A10-Series 9600P 2.4GHz",..: 55 53 64 75 57 15 74 53 96 73 ...
##  $ Ram             : num  8 8 8 16 8 4 16 8 16 8 ...
##  $ Memory          : Factor w/ 39 levels "1.0TB HDD","1.0TB Hybrid",..: 5 3 17 30 17 27 16 16 30 17 ...
##  $ Gpu             : Factor w/ 110 levels "AMD FirePro W4190M",..: 59 52 54 10 60 18 61 52 98 62 ...
##  $ OpSys           : Factor w/ 9 levels "Android","Chrome OS",..: 5 5 6 5 5 7 4 5 7 7 ...
##  $ Weight          : num  1.37 1.34 1.86 1.83 1.37 2.1 2.04 1.34 1.3 1.6 ...
##  $ Price           : num  1340 899 575 2537 1804 ...
##  $ Frequenza       : num  2.3 1.8 2.5 2.7 3.1 3 2.2 1.8 1.8 1.6 ...
##  $ Risoluzione     : Factor w/ 15 levels "1366x768","1440x900",..: 11 2 4 13 11 1 13 2 4 4 ...
##  $ Pixel           : int  4096000 1296000 2073600 5184000 4096000 1049088 5184000 1296000 2073600 2073600 ...
##  $ SolidStateDisk  : Factor w/ 2 levels "False","True": 2 1 2 2 2 1 1 1 2 2 ...
lm_full = lm(Price ~ ., data = data)
#summary(lm_full) #FIXME: wayyy too long to be printed, R^2 =0.9586
anova(lm_full, test="F")
## Analysis of Variance Table
## 
## Response: Price
##                    Df    Sum Sq  Mean Sq  F value    Pr(>F)    
## Company            18 104013991  5778555 114.5882 < 2.2e-16 ***
## TypeName            5 182262038 36452408 722.8478 < 2.2e-16 ***
## Inches              1   6163570  6163570 122.2230 < 2.2e-16 ***
## ScreenResolution   36 108074619  3002073  59.5308 < 2.2e-16 ***
## Cpu               110  95329933   866636  17.1853 < 2.2e-16 ***
## Ram                 1  34947028 34947028 692.9963 < 2.2e-16 ***
## Memory             35  17134540   489558   9.7079 < 2.2e-16 ***
## Gpu                88  34242874   389124   7.7163 < 2.2e-16 ***
## OpSys               6   3526085   587681  11.6537 1.198e-12 ***
## Weight              1       973      973   0.0193    0.8895    
## Residuals        1001  50479311    50429                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
drop1(lm_full, test="F")
## Single term deletions
## 
## Model:
## Price ~ Company + TypeName + Inches + ScreenResolution + Cpu + 
##     Ram + Memory + Gpu + OpSys + Weight + Frequenza + Risoluzione + 
##     Pixel + SolidStateDisk
##                  Df Sum of Sq      RSS   AIC F value    Pr(>F)    
## <none>                        50479311 14370                      
## Company          14   6197922 56677232 14493  8.7789 < 2.2e-16 ***
## TypeName          5   3685931 54165241 14452 14.6183  7.50e-14 ***
## Inches            1    210134 50689445 14373  4.1669   0.04148 *  
## ScreenResolution 23   5322877 55802188 14454  4.5892  8.09e-12 ***
## Cpu              88  16408116 66887427 14560  3.6974 < 2.2e-16 ***
## Ram               1   4481351 54960662 14479 88.8648 < 2.2e-16 ***
## Memory           34  10507055 60986365 14548  6.1281 < 2.2e-16 ***
## Gpu              88  30459868 80939179 14809  6.8638 < 2.2e-16 ***
## OpSys             6   3518495 53997806 14446 11.6286  1.28e-12 ***
## Weight            1       973 50480284 14368  0.0193   0.88953    
## Frequenza         0         0 50479311 14370                      
## Risoluzione       0         0 50479311 14370                      
## Pixel             0         0 50479311 14370                      
## SolidStateDisk    0         0 50479311 14370                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#coefplot(lm_full, intercept=FALSE) #meglio di no ahah

par(mfrow=c(2,2))
plot(lm_full)
## Warning: not plotting observations with leverage one:
##   13, 15, 18, 29, 34, 46, 84, 128, 160, 173, 178, 179, 205, 232, 267, 271, 299, 303, 324, 348, 388, 436, 438, 448, 457, 458, 466, 475, 501, 518, 520, 561, 564, 593, 611, 666, 671, 684, 689, 698, 702, 703, 718, 719, 724, 750, 756, 768, 777, 780, 781, 784, 808, 811, 817, 826, 827, 840, 848, 855, 902, 912, 922, 945, 946, 951, 954, 965, 977, 990, 1018, 1053, 1067, 1076, 1082, 1087, 1089, 1111, 1117, 1118, 1132, 1136, 1137, 1155, 1192, 1201, 1208, 1235

## Warning: not plotting observations with leverage one:
##   13, 15, 18, 29, 34, 46, 84, 128, 160, 173, 178, 179, 205, 232, 267, 271, 299, 303, 324, 348, 388, 436, 438, 448, 457, 458, 466, 475, 501, 518, 520, 561, 564, 593, 611, 666, 671, 684, 689, 698, 702, 703, 718, 719, 724, 750, 756, 768, 777, 780, 781, 784, 808, 811, 817, 826, 827, 840, 848, 855, 902, 912, 922, 945, 946, 951, 954, 965, 977, 990, 1018, 1053, 1067, 1076, 1082, 1087, 1089, 1111, 1117, 1118, 1132, 1136, 1137, 1155, 1192, 1201, 1208, 1235
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

par(mfrow=c(1,1))
par(mfrow=c(1,2))
boxplot(lm_full$residuals)
qqnorm(lm_full$residuals);qqline(lm_full$residuals) # probably the correction would work pretty fine here

#tests
ad.test(lm_full$residuals)
## 
##  Anderson-Darling normality test
## 
## data:  lm_full$residuals
## A = 19.821, p-value < 2.2e-16
shapiro.test(lm_full$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  lm_full$residuals
## W = 0.94917, p-value < 2.2e-16
library(MASS)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:EnvStats':
## 
##     boxcox
boxcoxreg1<-boxcox(lm_full)

which.max(boxcoxreg1$y)
## [1] 48
lambda=boxcoxreg1$x[which.max(boxcoxreg1$y)]
lambda
## [1] -0.1010101
lm_full_t = lm(log(Price) ~ ., data = data)

par(mfrow=c(2,2))
plot(lm_full_t) #quite better
## Warning: not plotting observations with leverage one:
##   13, 15, 18, 29, 34, 46, 84, 128, 160, 173, 178, 179, 205, 232, 267, 271, 299, 303, 324, 348, 388, 436, 438, 448, 457, 458, 466, 475, 501, 518, 520, 561, 564, 593, 611, 666, 671, 684, 689, 698, 702, 703, 718, 719, 724, 750, 756, 768, 777, 780, 781, 784, 808, 811, 817, 826, 827, 840, 848, 855, 902, 912, 922, 945, 946, 951, 954, 965, 977, 990, 1018, 1053, 1067, 1076, 1082, 1087, 1089, 1111, 1117, 1118, 1132, 1136, 1137, 1155, 1192, 1201, 1208, 1235

## Warning: not plotting observations with leverage one:
##   13, 15, 18, 29, 34, 46, 84, 128, 160, 173, 178, 179, 205, 232, 267, 271, 299, 303, 324, 348, 388, 436, 438, 448, 457, 458, 466, 475, 501, 518, 520, 561, 564, 593, 611, 666, 671, 684, 689, 698, 702, 703, 718, 719, 724, 750, 756, 768, 777, 780, 781, 784, 808, 811, 817, 826, 827, 840, 848, 855, 902, 912, 922, 945, 946, 951, 954, 965, 977, 990, 1018, 1053, 1067, 1076, 1082, 1087, 1089, 1111, 1117, 1118, 1132, 1136, 1137, 1155, 1192, 1201, 1208, 1235
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

ad.test(lm_full_t$residuals) #not really
## 
##  Anderson-Darling normality test
## 
## data:  lm_full_t$residuals
## A = 7.5169, p-value < 2.2e-16
shapiro.test(lm_full_t$residuals)  #not really
## 
##  Shapiro-Wilk normality test
## 
## data:  lm_full_t$residuals
## W = 0.98478, p-value = 1.874e-10

A look over outliers

library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:EnvStats':
## 
##     qqPlot
## The following object is masked from 'package:psych':
## 
##     logit
influencePlot(lm_full,main="Influence Plot", sub="Circle size is proportial to Cook's Distance" )

##        StudRes        Hat       CookD
## 13         NaN 1.00000000         NaN
## 15         NaN 1.00000000         NaN
## 434   2.124383 0.67519240 0.030955546
## 849   4.607995 0.04198279 0.003020118
## 1020  5.093056 0.04545388 0.003990584
## 1081 -4.381193 0.40114343 0.041814955
#Cook's Distance
cooksd <- cooks.distance(lm_full_t)
cooksda=data.frame(cooksd)
summary(cooksd)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.00000 0.00003 0.00013 0.00076 0.00053 0.04815      88
# identify D values > 4/(n-k-1)
# Cook's D plot
cutoff <- 4/((nrow(data)-length(lm_full_t$coefficients)-2))
plot(lm_full_t, which=4, cook.levels=cutoff)

plot(cooksd, pch="*", cex=1, main="Influential Obs by Cooks distance") # plot cook's distance
abline(h = cutoff, col="red") # add cutoff line
text(x=1:length(cooksd)+1, y=cooksd, labels=ifelse(cooksd>4*mean(cooksd, na.rm=T),names(cooksd),""),
col="red")#add labels

#extract influencial obs
influential <- as.numeric(names(cooksd)[(cooksd > cutoff)]) # influential row numbers
influ=data.frame(data[cooksd > cutoff, ])
filtered_data <- data[ !(row.names(data) %in% c(influential)), ]
#Outlier rimossi
lm_full_t_no_OUTliers = lm(log(Price) ~ ., data = filtered_data)
par(mfrow=c(2,2))
plot(lm_full_t_no_OUTliers)
## Warning: not plotting observations with leverage one:
##   15, 21, 32, 43, 148, 172, 196, 222, 288, 292, 312, 335, 408, 420, 430, 438, 439, 447, 456, 481, 485, 498, 500, 541, 544, 572, 589, 648, 651, 661, 666, 667, 679, 693, 694, 699, 724, 730, 742, 746, 751, 755, 758, 765, 791, 800, 801, 807, 814, 822, 829, 836, 863, 874, 883, 893, 914, 915, 919, 922, 932, 943, 944, 945, 957, 984, 1017, 1031, 1039, 1044, 1049, 1051, 1066, 1073, 1074, 1078, 1079, 1093, 1097, 1098, 1116, 1153, 1162, 1169, 1195, 1210

## Warning: not plotting observations with leverage one:
##   15, 21, 32, 43, 148, 172, 196, 222, 288, 292, 312, 335, 408, 420, 430, 438, 439, 447, 456, 481, 485, 498, 500, 541, 544, 572, 589, 648, 651, 661, 666, 667, 679, 693, 694, 699, 724, 730, 742, 746, 751, 755, 758, 765, 791, 800, 801, 807, 814, 822, 829, 836, 863, 874, 883, 893, 914, 915, 919, 922, 932, 943, 944, 945, 957, 984, 1017, 1031, 1039, 1044, 1049, 1051, 1066, 1073, 1074, 1078, 1079, 1093, 1097, 1098, 1116, 1153, 1162, 1169, 1195, 1210
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

#summary(lm_full_t_no_OUTliers) #FIXME: too long to be printed, R^2=0.9727
ncvTest(lm_full_t_no_OUTliers)
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 1.740444, Df = 1, p = 0.18708
null = lm(log(Price) ~ 1, data = filtered_data)
full = lm(log(Price) ~ ., data = filtered_data)
lm_fit = stepAIC(null, scope = list(upper = full), direction = "both", trace = FALSE)
drop1(lm_fit, test = 'F')
## Single term deletions
## 
## Model:
## log(Price) ~ Cpu + Memory + OpSys + Gpu + TypeName + ScreenResolution + 
##     Company + Ram + Inches
##                  Df Sum of Sq    RSS     AIC F value    Pr(>F)    
## <none>                        32.161 -4053.8                      
## Cpu              84   15.5056 47.667 -3724.9  5.5789 < 2.2e-16 ***
## Memory           34    9.9754 42.136 -3780.6  8.8673 < 2.2e-16 ***
## OpSys             5    5.9181 38.079 -3850.5 35.7726 < 2.2e-16 ***
## Gpu              85   10.7290 42.890 -3860.2  3.8149 < 2.2e-16 ***
## TypeName          4    2.1668 34.328 -3979.5 16.3717 5.483e-13 ***
## ScreenResolution 28    4.7325 36.893 -3936.4  5.1082 5.348e-16 ***
## Company          14    3.0807 35.242 -3966.3  6.6506 4.801e-13 ***
## Ram               1    1.5499 33.711 -3996.4 46.8425 1.360e-11 ***
## Inches            1    1.0393 33.200 -4015.7 31.4100 2.720e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1